1 PLEURA cellranger QC

1.1 Objective

We are going to perform a quality control (QC) analysis of the mapping results obtained by running cellranger version 7.0.0.

We will pull together all the libraries from all the PLEURA subprojects.

## [1] "CSF_01"
## [1] "4608"
## [1] "4839"
## [1] "5700"
## [1] "5792"
## [1] "5929"
## [1] "CSF_02"
## [1] "7921"
## [1] "7974"
## [1] "CSF_03"
## [1] "3054"
## [1] "3087"
## [1] "3887"
## [1] "8102"

2 Gene Expression QC

We will start by showing the most relevant metrics obtained by cellranger for each of the working libraries (number of reads, estimated number of recovered cells, fraction of reads in cells, mean reads per cell, fraction of reads mapped to exonic regions, and median genes per cell). This information gives us an idea of the quality of the experiment, as well as of the sequencing and mapping steps.

GEX QC metrics
cellranger v 7.0.0

| Subproject | GemID | Cells | Median UMI counts per cell | Median genes per cell | Median reads per cell | Total genes detected | Number of reads |
|---|---|---|---|---|---|---|---|
| CSF_01 | 4608 | 3984 | 1623 | 858 | 19472 | 22942 | 266.97M |
| CSF_01 | 4839 | 3504 | 2964 | 1303 | 41738 | 22837 | 306.87M |
| CSF_01 | 5700 | 8989 | 2310 | 1049 | 16548 | 23935 | 298.90M |
| CSF_01 | 5792 | 7602 | 3659 | 1517 | 22742 | 25079 | 303.81M |
| CSF_01 | 5929 | 6962 | 3095 | 1319 | 16182 | 24309 | 261.68M |
| CSF_02 | 7921 | 4813 | 3998 | 1615 | 30334 | 23910 | 315.94M |
| CSF_02 | 7974 | 10498 | 4237 | 1825 | 31747 | 26802 | 610.99M |
| CSF_03 | 3054 | 1264 | 2696 | 1246 | 65543 | 22785 | 297.91M |
| CSF_03 | 3087 | 4294 | 5740 | 1968 | 43336 | 25201 | 305.29M |
| CSF_03 | 3887 | 2480 | 3094 | 1292 | 28178 | 26006 | 257.35M |
| CSF_03 | 8102 | 9637 | 2641 | 1030 | 19239 | 24883 | 360.44M |
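
Mean reads per cell is listed among the relevant metrics but is not a column of the GEX table. As a rough cross-check, it can be approximated as the number of reads divided by the number of cells. The following is a minimal Python sketch, not the code used to build this report; the three libraries are copied from the table, and because the read counts there are rounded to 0.01M the results are only approximate. The names `gex_libraries` and `mean_reads_per_cell` are illustrative.

```python
# Number of reads and cells copied from the GEX table for three libraries.
# Read counts are rounded to 0.01M in the table, so results are approximate.
gex_libraries = {
    "4608": {"reads": 266.97e6, "cells": 3984},
    "7974": {"reads": 610.99e6, "cells": 10498},
    "3054": {"reads": 297.91e6, "cells": 1264},
}


def mean_reads_per_cell(reads, cells):
    """Total sequenced reads divided by the estimated number of cells."""
    if cells <= 0:
        raise ValueError("cells must be a positive number")
    return reads / cells


for gem_id, lib in gex_libraries.items():
    value = mean_reads_per_cell(lib["reads"], lib["cells"])
    print(f"{gem_id}: ~{value:,.0f} mean reads per cell")
```

Library 3054 has a similar number of reads to the others but far fewer cells, so its depth per cell is much higher; this is consistent with its high median reads per cell in the table.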

2.1 Mapping QC

Next, we will check the quality of the mapping step performed by cellranger 7.0.0 across libraries. To do so, we will compare the percentage of reads mapped to the genome and, within these mapped reads, the proportion mapped to intergenic, intronic and exonic regions. We aim to obtain libraries with a high percentage of confidently mapped reads, and especially a high percentage of exonic reads, which correspond to expressed genes (RNAs). Reads mapping to intergenic regions suggest contamination with ambient DNA, whereas reads mapping to intronic regions may come from pre-mRNAs or from mature spliced isoforms that retain certain introns.

## [1] "Confidently_mapped_to_genome"
## [1] "Confidently_mapped_to_intergenic_regions"
## [1] "Confidently_mapped_to_intronic_regions"
## [1] "Confidently_mapped_to_exonic_regions"
## [1] "Confidently_mapped_antisense"
## [1] "Confidently_mapped_to_transcriptome"
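
The comparison "within these mapped reads" can be sketched as a simple re-normalisation. The Python snippet below is illustrative only: it assumes each metric is reported as a percentage of all reads, and the numbers in `example_library` are made up for the example, not taken from any library in this report. The function name `shares_within_mapped` is hypothetical.

```python
REGION_METRICS = (
    "Confidently_mapped_to_intergenic_regions",
    "Confidently_mapped_to_intronic_regions",
    "Confidently_mapped_to_exonic_regions",
)


def shares_within_mapped(pct_of_total):
    """Re-express region percentages relative to genome-mapped reads.

    Assumes every value in `pct_of_total` is a percentage of all reads.
    """
    genome = pct_of_total["Confidently_mapped_to_genome"]
    if genome <= 0:
        raise ValueError("no reads confidently mapped to the genome")
    return {name: 100.0 * pct_of_total[name] / genome for name in REGION_METRICS}


# Hypothetical values for illustration only (not from this dataset).
example_library = {
    "Confidently_mapped_to_genome": 90.0,
    "Confidently_mapped_to_intergenic_regions": 4.5,
    "Confidently_mapped_to_intronic_regions": 22.5,
    "Confidently_mapped_to_exonic_regions": 63.0,
}

for name, share in shares_within_mapped(example_library).items():
    print(f"{name}: {share:.1f}% of genome-mapped reads")
```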

2.2 Sequencing saturation and depth

After assessing the mapped reads, it is important to check the sequencing saturation and depth of each library. Sequencing saturation depends on the library complexity and on the sequencing depth. Library complexity is the total number of different transcripts present in the library, and it varies between cell types and tissues, whereas sequencing depth is the number of paired reads per cell. For this reason, we will plot the number of detected genes as a function of depth (sequenced reads). As sequencing depth increases, more genes are detected, but this function reaches a plateau at which more sequenced reads do not result in more detected genes; at this point we can be confident that we sequenced to saturation. More specifically, the sequencing saturation is the fraction of confidently mapped, valid cell-barcode, valid UMI reads that had a non-unique (cell-barcode, UMI, gene) combination.
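
To make the definition concrete, the sketch below computes saturation from a toy list of reads. It is an illustration in Python, not cellranger's implementation: it reads the definition as 1 minus the ratio of unique (cell-barcode, UMI, gene) combinations to total reads, and it assumes the input already contains only confidently mapped reads with valid barcodes and UMIs. The barcodes, UMIs and gene names are invented.

```python
def sequencing_saturation(reads):
    """Fraction of reads that duplicate an already seen (barcode, UMI, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per
    confidently mapped read with a valid barcode and a valid UMI.
    """
    reads = list(reads)
    if not reads:
        raise ValueError("at least one read is required")
    n_unique = len(set(reads))
    return 1.0 - n_unique / len(reads)


# Toy input: 6 reads, 4 distinct (barcode, UMI, gene) combinations.
toy_reads = [
    ("AAAC", "UMI1", "GeneA"),
    ("AAAC", "UMI1", "GeneA"),  # duplicate of the first read
    ("AAAC", "UMI2", "GeneA"),
    ("TTTG", "UMI1", "GeneB"),
    ("TTTG", "UMI1", "GeneB"),  # duplicate of the previous read
    ("TTTG", "UMI3", "GeneC"),
]

print(f"saturation = {sequencing_saturation(toy_reads):.3f}")
```

A value close to 0 means almost every read reveals a new molecule, so deeper sequencing would still add information; a value close to 1 means most additional reads would be duplicates.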

3 VDJ-T QC

We will start by showing the most relevant metrics obtained by cellranger for each of the working libraries (number of reads, estimated number of recovered cells, fraction of reads in cells, mean reads per cell, fraction of reads mapped to any V(D)J gene, and cells with a productive V-J Spanning Pair). This information gives us an idea of the quality of the experiment, as well as of the sequencing and mapping steps.

VDJ-T QC metrics
cellranger v 7.0.0

| Subproject | GemID | Number of reads | Estimated number of cells | Fraction reads in cells (%) | Mean reads per cell | Reads mapped to any V(D)J gene (%) | Cells with productive V-J spanning pair (%) |
|---|---|---|---|---|---|---|---|
| CSF_01 | 4608 | 13071011 | 815 | 34.80 | 0.02M | 38.45 | 58.65 |
| CSF_01 | 4839 | 15751069 | 1912 | 77.62 | 0.01M | 72.19 | 77.62 |
| CSF_01 | 5700 | 14127176 | 5719 | 62.66 | 0.00M | 51.65 | 64.84 |
| CSF_01 | 5792 | 18090339 | 2729 | 43.33 | 0.01M | 31.17 | 69.11 |
| CSF_01 | 5929 | 14573775 | 4297 | 52.51 | 0.00M | 82.88 | 80.78 |
| CSF_02 | 7921 | 19365195 | 3254 | 84.28 | 0.01M | 77.96 | 82.27 |
| CSF_02 | 7974 | 23961188 | 5087 | 83.40 | 0.00M | 83.33 | 81.80 |

3.1 Mapping QC

Next, we will check the quality of the V(D)J mapping step performed by cellranger 7.0.0 across libraries. To do so, we will compare the percentage of reads mapped to any germline V(D)J gene segment and, within these mapped reads, the proportion mapped to TRA and TRB germline gene segments.

3.2 V(D)J Expression

Here, we will assess the median number of UMIs assigned to a TRA/TRB contig per cell. Low values for either of the two parameters can indicate cells with extremely low TRA/TRB expression or poor cell quality, among other causes.

VDJ-T expression
cellranger v 7.0.0

| GemID | Median TRA UMIs per cell | Median TRB UMIs per cell |
|---|---|---|
| 4608 | 2 | 4 |
| 4839 | 3 | 7 |
| 5700 | 3 | 8 |
| 5792 | 3 | 7 |
| 5929 | 5 | 10 |
| 7921 | 4 | 10 |
| 7974 | 5 | 10 |

3.3 V(D)J Annotation

Now, we will check the V(D)J annotation for the studied samples. To better interpret the results, we will rely on the information given in the cellranger web summary file. We will assess the following fractions of cell-associated barcodes (with at least…):

  • Cells With TRA/TRB Contig: one TRA/TRB contig annotated as a full or partial V(D)J gene.

  • Cells With CDR3-annotated TRA/TRB Contig: one TRA/TRB contig where a CDR3 was detected.

  • Cells With Productive TRA/TRB Contig: one contig that spans the 5’ end of the V region to the 3’ end of the J region for TRA/TRB, has a start codon in the expected part of the V sequence, has an in-frame CDR3, and has no stop codons in the aligned V-J region.

  • Cells With Productive V-J Spanning Pair: one productive contig for each chain of the receptor pair. We also report the corresponding number of cells with a productive V-J Spanning Pair.

For all the previous parameters, low values can indicate poor cell quality, low yield from the RT reaction, or poor specificity of the V(D)J enrichment. Moreover, we will also check:

  • Paired Clonotype Diversity: effective diversity of the paired clonotypes. It is computed as the Inverse Simpson Index of the clonotype frequencies. A value of 1 indicates a minimally diverse sample (only one distinct clonotype was detected), whereas a value equal to the estimated number of cells indicates a maximally diverse sample.
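
The Inverse Simpson Index mentioned above can be illustrated with a short Python sketch. This is a generic implementation of the index (1 divided by the sum of squared clonotype frequencies), not cellranger's own code, and the clonotype labels are invented for the example.

```python
from collections import Counter


def inverse_simpson(clonotype_labels):
    """Inverse Simpson Index: 1 / sum(p_i^2) over clonotype frequencies p_i.

    `clonotype_labels` holds one clonotype label per cell.
    """
    counts = Counter(clonotype_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one cell is required")
    return 1.0 / sum((n / total) ** 2 for n in counts.values())


# Hypothetical samples with four cells each.
one_clone = ["c1", "c1", "c1", "c1"]        # a single expanded clonotype
all_distinct = ["c1", "c2", "c3", "c4"]     # every cell has its own clonotype
partly_expanded = ["c1", "c1", "c2", "c3"]  # one clonotype seen twice

for name, sample in [
    ("one_clone", one_clone),
    ("all_distinct", all_distinct),
    ("partly_expanded", partly_expanded),
]:
    print(name, round(inverse_simpson(sample), 2))
```

The two extreme samples reproduce the bounds described in the text: a single clonotype gives 1, and a sample in which every cell carries a different clonotype gives the number of cells.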

V(D)J annotation
cellranger v 7.0.0

| GEM ID | Estimated number of recovered cells | Productive V-J spanning pair: fraction (%) | Productive V-J spanning pair: cells | Paired clonotype diversity | Productive contig: TRA (%) | Productive contig: TRB (%) |
|---|---|---|---|---|---|---|
| 4608 | 815 | 58.65 | 478 | 109.05 | 65.28 | 93.37 |
| 4839 | 1912 | 77.62 | 1484 | 457.31 | 80.65 | 96.97 |
| 5700 | 5719 | 64.84 | 3708 | 3103.42 | 68.86 | 95.98 |
| 5792 | 2729 | 69.11 | 1886 | 998.72 | 72.63 | 96.48 |
| 5929 | 4297 | 80.78 | 3471 | 648.62 | 83.17 | 97.60 |
| 7921 | 3254 | 82.27 | 2677 | 1905.78 | 84.97 | 97.30 |
| 7974 | 5087 | 81.80 | 4161 | 748.75 | 84.18 | 97.62 |
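
As a sanity check on the table above, the productive V-J spanning pair fraction should equal the number of cells with a productive pair divided by the estimated number of recovered cells. The Python sketch below recomputes it from the values in the table; the variable names and the 0.01 tolerance (to absorb rounding to two decimals) are choices made for this example.

```python
# (GEM ID, estimated cells, cells with productive pair, reported fraction in %)
# Values copied from the VDJ-T annotation table.
vdj_t_annotation = [
    ("4608", 815, 478, 58.65),
    ("4839", 1912, 1484, 77.62),
    ("5700", 5719, 3708, 64.84),
    ("5792", 2729, 1886, 69.11),
    ("5929", 4297, 3471, 80.78),
    ("7921", 3254, 2677, 82.27),
    ("7974", 5087, 4161, 81.80),
]


def productive_pair_fraction(cells_with_pair, estimated_cells):
    """Percentage of recovered cells that carry a productive V-J spanning pair."""
    if estimated_cells <= 0:
        raise ValueError("estimated_cells must be positive")
    return 100.0 * cells_with_pair / estimated_cells


for gem_id, cells, with_pair, reported in vdj_t_annotation:
    computed = productive_pair_fraction(with_pair, cells)
    status = "ok" if abs(computed - reported) < 0.01 else "MISMATCH"
    print(f"{gem_id}: computed {computed:.2f}%, reported {reported:.2f}% [{status}]")
```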

4 VDJ-B QC

We will start by showing the most relevant metrics obtained by cellranger for each of the working libraries (number of reads, estimated number of recovered cells, fraction of reads in cells, mean reads per cell, fraction of reads mapped to any V(D)J gene, and cells with a productive V-J Spanning Pair). This information gives us an idea of the quality of the experiment, as well as of the sequencing and mapping steps.

BCR-V(D)J QC metrics
cellranger v 7.0.0

| GEM ID | Number of reads | Estimated number of recovered cells | Fraction of reads in cells | Mean reads per cell | Fraction of reads mapped to any V(D)J gene | Cells with productive V-J spanning pair |
|---|---|---|---|---|---|---|
| 4608 | 280.04M | 2 | 0.0% | 140021911 | 2.2% | 0 |
| 4839 | 13.55M | 13 | 48.3% | 1042457 | 47.6% | 13 |
| 5929 | 13.87M | 414 | 92.0% | 33499 | 90.5% | 387 |
| 7921 | 18.35M | 134 | 16.9% | 136953 | 16.8% | 125 |
| 7974 | 19.85M | 1852 | 88.1% | 10717 | 88.5% | 1311 |

4.1 Mapping QC

Next, we will check the quality of the V(D)J mapping step performed by cellranger 7.0.0 across libraries. To do so, we will compare the percentage of reads mapped to any germline V(D)J gene segment and, within these mapped reads, the proportion mapped to IGH, IGK and IGL germline gene segments.

4.2 V(D)J Expression

Here, we will assess the median number of UMIs assigned to an IGH/IGK/IGL contig per cell. Low values for any of the three parameters can indicate cells with extremely low IGH/IGK/IGL expression or poor cell quality, among other causes.

V(D)J expression
cellranger v 7.0.0

| GEM ID | Median IGH UMIs per cell | Median IGK UMIs per cell | Median IGL UMIs per cell |
|---|---|---|---|
| 4608 | 6.0 | 21 | NA |
| 4839 | 1995.0 | 7406 | 85 |
| 5929 | 2185.5 | 4343 | 5830 |
| 7921 | 10.5 | 26 | 30 |
| 7974 | 12.0 | 29 | 27 |

4.3 V(D)J Annotation

Now, we will check the V(D)J annotation for the studied samples. To better interpret the results, we will rely on the information given in the cellranger web summary file. We will assess the following fractions of cell-associated barcodes (with at least…):

  • Cells With IGH/IGK/IGL Contig: one IGH/IGK/IGL contig annotated as a full or partial V(D)J gene.

  • Cells With CDR3-annotated IGH/IGK/IGL Contig: one IGH/IGK/IGL contig where a CDR3 was detected.

  • Cells With Productive IGH/IGK/IGL Contig: one contig that spans the 5’ end of the V region to the 3’ end of the J region for IGH/IGK/IGL, has a start codon in the expected part of the V sequence, has an in-frame CDR3, and has no stop codons in the aligned V-J region.

  • Cells With Productive V-J Spanning Pair: one productive contig for each chain of the receptor pair. We also report the corresponding number of cells with a productive V-J Spanning Pair.

For all the previous parameters, low values can indicate poor cell quality, low yield from the RT reaction, or poor specificity of the V(D)J enrichment. Moreover, we will also check:

  • Paired Clonotype Diversity: effective diversity of the paired clonotypes. It is computed as the Inverse Simpson Index of the clonotype frequencies. A value of 1 indicates a minimally diverse sample (only one distinct clonotype was detected), whereas a value equal to the estimated number of cells indicates a maximally diverse sample.
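
V(D)J annotation
cellranger v 7.0.0

| GEM ID | Productive contig: IGH (%) | Productive contig: IGK (%) | Productive contig: IGL (%) | Estimated number of recovered cells | Productive V-J spanning pair: fraction IGK-IGH pair (%) | Productive V-J spanning pair: fraction IGL-IGH pair (%) | Productive V-J spanning pair: cells | Paired clonotype diversity |
|---|---|---|---|---|---|---|---|---|
| 4608 | 0.00 | 100.00 | 0.00 | 2 | 0.00 | 0.00 | 0 | 2.00 |
| 4839 | 100.00 | 61.54 | 38.46 | 13 | 61.54 | 38.46 | 13 | 13.00 |
| 5929 | 93.48 | 67.39 | 33.09 | 414 | 63.04 | 30.92 | 387 | 50.95 |
| 7921 | 93.28 | 57.46 | 42.54 | 134 | 52.99 | 40.30 | 125 | 128.26 |
| 7974 | 71.71 | 61.02 | 38.50 | 1852 | 44.28 | 26.94 | 1311 | 497.09 |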

5 Data overview

## [1] "Libraries metadata"
##    project subproject gem_id library_id library_name library_barcode    hashing
## 1      CSF     CSF_01   4608     276966     4608_GEX          AZ8142 not_hashed
## 2      CSF     CSF_01   4839     276967     4839_GEX          AZ8143 not_hashed
## 3      CSF     CSF_01   5929     276968     5929_GEX          AZ8144 not_hashed
## 4      CSF     CSF_01   5700     276969     5700_GEX          AZ8145 not_hashed
## 5      CSF     CSF_01   5792     276970     5792_GEX          AZ8146 not_hashed
## 6      CSF     CSF_01   4608     277405     4608_TCR          AZ8390 not_hashed
## 7      CSF     CSF_01   4839     277406     4839_TCR          AZ8391 not_hashed
## 8      CSF     CSF_01   5929     277407     5929_TCR          AZ8392 not_hashed
## 9      CSF     CSF_01   5700     277408     5700_TCR          AZ8393 not_hashed
## 10     CSF     CSF_01   5792     277409     5792_TCR          AZ8394 not_hashed
## 11     CSF     CSF_01   4839     277410     4839_BCR          AZ8396 not_hashed
## 12     CSF     CSF_01   5929     277411     5929_BCR          AZ8397 not_hashed
## 13     CSF     CSF_02   7921     277387     7921_GEX          AZ7864 not_hashed
## 14     CSF     CSF_02   7974     277388     7974_GEX          AZ7865 not_hashed
## 15     CSF     CSF_02   7921     277401     7921_TCR          AZ8125 not_hashed
## 16     CSF     CSF_02   7974     277402     7974_TCR          AZ8126 not_hashed
## 17     CSF     CSF_02   7921     277403     7921_BCR          AZ8127 not_hashed
## 18     CSF     CSF_02   7974     277404     7974_BCR          AZ8128 not_hashed
## 19     CSF     CSF_03   3087     277820     3087_GEX          AZ8561 not_hashed
## 20     CSF     CSF_03   3887     277821     3887_GEX          AZ8562 not_hashed
## 21     CSF     CSF_03   8102     277822     8102_GEX          AZ8563 not_hashed
## 22     CSF     CSF_03   3054     277823     3054_GEX          AZ8564 not_hashed
##     type donor_id  wet_lab
## 1   cDNA 4608_GEX 4608_GEX
## 2   cDNA 4839_GEX 4839_GEX
## 3   cDNA 5929_GEX 5929_GEX
## 4   cDNA 5700_GEX 5700_GEX
## 5   cDNA 5792_GEX 5792_GEX
## 6  VDJ-T 4608_TCR 4608_TCR
## 7  VDJ-T 4839_TCR 4839_TCR
## 8  VDJ-T 5929_TCR 5929_TCR
## 9  VDJ-T 5700_TCR 5700_TCR
## 10 VDJ-T 5792_TCR 5792_TCR
## 11 VDJ-B 4839_BCR 4839_BCR
## 12 VDJ-B 5929_BCR 5929_BCR
## 13  cDNA 7921_GEX 7921_GEX
## 14  cDNA 7974_GEX 7974_GEX
## 15 VDJ-T 7921_TCR 7921_TCR
## 16 VDJ-T 7974_TCR 7974_TCR
## 17 VDJ-B 7921_BCR 7921_BCR
## 18 VDJ-B 7974_BCR 7974_BCR
## 19  cDNA 3087_GEX 3087_GEX
## 20  cDNA 3887_GEX 3887_GEX
## 21  cDNA 8102_GEX 8102_GEX
## 22  cDNA 3054_GEX 3054_GEX
## [1] "GEX QC summary table"
## # A tibble: 11 x 26
##    Subproj~1 GemID Cells Confi~2 Media~3 Media~4 Media~5 Total~6 Numbe~7 Numbe~8
##    <chr>     <chr> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 CSF_01    4608   3984    85.8    1623     858   19472   22942  2.67e8       0
##  2 CSF_01    4839   3504    93.3    2964    1303   41738   22837  3.07e8       0
##  3 CSF_01    5700   8989    93.0    2310    1049   16548   23935  2.99e8       0
##  4 CSF_01    5792   7602    91.8    3659    1517   22742   25079  3.04e8       0
##  5 CSF_01    5929   6962    79.4    3095    1319   16182   24309  2.62e8       0
##  6 CSF_02    7921   4813    93.8    3998    1615   30334   23910  3.16e8       0
##  7 CSF_02    7974  10498    89.9    4237    1825   31747   26802  6.11e8       0
##  8 CSF_03    3054   1264    86.2    2696    1246   65543   22785  2.98e8       0
##  9 CSF_03    3087   4294    90.2    5740    1968   43336   25201  3.05e8       0
## 10 CSF_03    3887   2480    84.8    3094    1292   28178   26006  2.57e8       0
## 11 CSF_03    8102   9637    91.0    2641    1030   19239   24883  3.60e8       0
## # ... with 16 more variables: Q30_RNA_read <dbl>, Q30_UMI <dbl>,
## #   Q30_barcodes <dbl>, Confidently_mapped_antisense <dbl>,
## #   Confidently_mapped_to_exonic_regions <dbl>,
## #   Confidently_mapped_to_genome <dbl>,
## #   Confidently_mapped_to_intergenic_regions <dbl>,
## #   Confidently_mapped_to_intronic_regions <dbl>,
## #   Confidently_mapped_to_transcriptome <dbl>, ...
## [1] "VDJ-T QC summary table"
## # A tibble: 7 x 23
##   Subpro~1 GemID Cells~2 Cells~3 Cells~4 Cells~5 Estim~6 Media~7 Media~8 Numbe~9
##   <chr>    <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 CSF_01   4608     65.3    93.4    58.6    58.6     815       2       4     478
## 2 CSF_01   4839     80.6    97.0    77.6    77.6    1912       3       7    1484
## 3 CSF_01   5700     68.9    96.0    64.8    64.8    5719       3       8    3708
## 4 CSF_01   5792     72.6    96.5    69.1    69.1    2729       3       7    1886
## 5 CSF_01   5929     83.2    97.6    80.8    80.8    4297       5      10    3471
## 6 CSF_02   7921     85.0    97.3    82.3    82.3    3254       4      10    2677
## 7 CSF_02   7974     84.2    97.6    81.8    81.8    5087       5      10    4161
## # ... with 13 more variables: Paired_clonotype_diversity <dbl>,
## #   Number_of_reads <dbl>, Number_of_short_reads_skipped <dbl>,
## #   Q30_RNA_read <dbl>, Q30_UMI <dbl>, Q30_barcodes <dbl>,
## #   Fraction_reads_in_cells <dbl>, Mean_reads_per_cell <dbl>,
## #   Mean_used_reads_per_cell <dbl>, Reads_mapped_to_TRA <dbl>,
## #   Reads_mapped_to_TRB <dbl>, Reads_mapped_to_any_V_D_J_gene <dbl>,
## #   Valid_barcodes <dbl>, and abbreviated variable names 1: Subproject, ...
## [1] "VDJ-B QC summary table"
## # A tibble: 5 x 27
##   Subpro~1 GemID Cells~2 Cells~3 Cells~4 Cells~5 Cells~6 Cells~7 Estim~8 Media~9
##   <chr>    <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 CSF_01   4608      0     100       0       0       0       0         2     6  
## 2 CSF_01   4839    100      61.5    38.5    61.5    38.5   100        13  1995  
## 3 CSF_01   5929     93.5    67.4    33.1    63.0    30.9    93.5     414  2186. 
## 4 CSF_02   7921     93.3    57.5    42.5    53.0    40.3    93.3     134    10.5
## 5 CSF_02   7974     71.7    61.0    38.5    44.3    26.9    70.8    1852    12  
## # ... with 17 more variables: Median_IGK_UMIs_per_Cell <dbl>,
## #   Number_of_cells_with_productive_V_J_spanning_pair <dbl>,
## #   Paired_clonotype_diversity <dbl>, Number_of_reads <dbl>,
## #   Number_of_short_reads_skipped <dbl>, Q30_RNA_read <dbl>, Q30_UMI <dbl>,
## #   Q30_barcodes <dbl>, Fraction_reads_in_cells <dbl>,
## #   Mean_reads_per_cell <dbl>, Mean_used_reads_per_cell <dbl>,
## #   Reads_mapped_to_IGH <dbl>, Reads_mapped_to_IGK <dbl>, ...
